This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Conversation

gevtushenko
Collaborator

To fix the issue, I've reduced the number of comparisons performed for a partially filled tile. The issue arises because each thread processes a fixed number of items. If the input size isn't a multiple of ITEMS_PER_TILE, there isn't enough data to fill every thread's local storage in the last tile, so the remaining slots are filled with a fixed value. Each thread then sorts its local storage with stable_odd_even_sort, including threads whose items lie entirely outside the input range. The fix skips stable_odd_even_sort for threads of the last tile that hold no actual input data: their storage contains only copies of the fixed value, so it is already sorted.
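A minimal host-side sketch of this idea, not the actual kernel code: it simulates the threads of one tile with a loop and counts comparator calls with and without the skip. The tile shape (4 threads, 3 items each), the helper names `odd_even_sort` and `count_comparisons`, and the choice of the last input element as the padding value are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tile shape for illustration; not the values Thrust uses.
constexpr int THREADS_PER_TILE = 4;
constexpr int ITEMS_PER_THREAD = 3;
constexpr int ITEMS_PER_TILE   = THREADS_PER_TILE * ITEMS_PER_THREAD;

// Comparator that counts how many times it is invoked.
struct CountingLess {
    long* counter;
    bool operator()(int a, int b) const { ++*counter; return a < b; }
};

// Odd-even transposition sort over one thread's local items.
// Swapping only on a strict "less" keeps equal items in order (stable).
template <class Compare>
void odd_even_sort(int* items, Compare less) {
    for (int pass = 0; pass < ITEMS_PER_THREAD; ++pass)
        for (int j = pass & 1; j + 1 < ITEMS_PER_THREAD; j += 2)
            if (less(items[j + 1], items[j])) std::swap(items[j], items[j + 1]);
}

// Simulates the per-thread sort of one partially filled tile and
// returns the number of comparator calls.
long count_comparisons(const std::vector<int>& input, bool skip_empty_threads) {
    assert(!input.empty() &&
           input.size() <= static_cast<std::size_t>(ITEMS_PER_TILE));
    const int valid_items = static_cast<int>(input.size());

    // Pad the tile with a fixed value (assumption: the last input element).
    std::vector<int> tile(ITEMS_PER_TILE, input.back());
    std::copy(input.begin(), input.end(), tile.begin());

    long comparisons = 0;
    CountingLess less{&comparisons};
    for (int tid = 0; tid < THREADS_PER_TILE; ++tid) {
        const int first = tid * ITEMS_PER_THREAD;
        // A thread whose first item is past the input holds only padding,
        // so its local storage is already sorted and the sort can be skipped.
        if (skip_empty_threads && first >= valid_items) continue;
        odd_even_sort(&tile[first], less);
    }
    return comparisons;
}
```

With this hypothetical tile shape, `count_comparisons({2, 1}, false)` returns 12 and `count_comparisons({2, 1}, true)` returns 3, because only the first simulated thread holds real data. For a completely filled tile the two counts are equal.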

The fix doesn't affect performance, even for small input sizes and complex comparators. For an input of 2 INT32 elements, the number of comparisons is reduced by a factor of 507 (RTX 3090).

@GPUtester
Collaborator

Can one of the admins verify this patch?

@alliepiper
Collaborator

add to allowlist

@gevtushenko
Collaborator Author

run tests

@alliepiper alliepiper added this to the 1.13.0 milestone May 26, 2021
@gevtushenko gevtushenko force-pushed the main-feature/github/thrust_merge_sort_comparisons_count_reduction/1436 branch from 1f59b78 to 403829e Compare May 26, 2021 20:11
@alliepiper
Collaborator

DVS CL: 30008797

run tests

@alliepiper alliepiper added the labels "testing: internal ci in progress" (Currently testing on internal NVIDIA CI (DVS).), "testing: gpuCI in progress" (Started gpuCI testing.), "testing: gpuCI passed" (Passed gpuCI testing.), "testing: internal ci passed" (Passed internal NVIDIA CI (DVS).) and removed the labels "testing: gpuCI in progress" (Started gpuCI testing.), "testing: internal ci in progress" (Currently testing on internal NVIDIA CI (DVS).) May 26, 2021
@alliepiper alliepiper self-assigned this Jun 2, 2021
@alliepiper alliepiper merged commit 1865104 into NVIDIA:main Jun 8, 2021
Labels
testing: gpuCI passed Passed gpuCI testing. testing: internal ci passed Passed internal NVIDIA CI (DVS).
Development

Successfully merging this pull request may close these issues.

Too many comparisions during calling to thrust::sort and thrust::stable_sort
3 participants